We present algorithmic improvements to the overlap Hybrid Monte Carlo algorithm, including
preconditioning techniques and improvements to the correction step, used when one of the
eigenvalues of the kernel operator changes sign, which is now $O(\Delta\tau^2)$ exact.
1. Improved Correction Step

The sign function in the overlap Dirac operator creates a discontinuity $-2d$ in the
pseudo-fermion contribution to the action whenever an eigenvalue of the kernel operator changes
sign. To conserve energy, we integrate up to the computer time where the eigenvalue crosses, and
introduce a discontinuity in the kinetic energy which exactly cancels the jump in the
pseudo-fermion energy. A general area conserving and reversible update which can do this is:
$\Pi^-$ is the original momentum, $\Pi^+$ the final momentum, $A$ is an arbitrary function of the
gauge field at $\tau_c$, $\eta$ is a unit vector normal to the $\lambda = 0$ surface, and the
$\eta_j$ are unit vectors normal to $\eta$. The original algorithm set = 4 and $d_j = 0$, and had
$O(\tau_c)$ errors. We can use the $d_j$ terms to cancel these errors, giving the transmission
algorithm:



n+ = n- + T,(F) - T]T,(T],f ) - |Tr(F) + {^,u-)J\ + ^ 
$d_2 = -2\tau_c (F^-,\eta)(\Pi^-,\eta) + 2\tau_c (F^+,\eta)(\Pi^+,\eta) + 2\tau_c (\bar{F}, F^+ - F^-)$,
$\bar{F} = \tfrac{1}{2}(F^- + F^+)$,

where $F^\pm$ are the MD forces immediately before and after the crossing. We cannot use this
algorithm if it would lead to complex $\Pi^+$. In this case, we have to reflect off the
$\lambda = 0$ surface, and there will be no topological charge change. Figure 1 shows how the
energy difference across the correction step varies as a function of $\Delta\tau$. It clearly
shows that the energy errors are at most $O(\Delta\tau^2)$.
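
The transmit-or-reflect decision described above can be illustrated with a minimal sketch. It is not the $O(\Delta\tau^2)$ transmission formula of this section: it omits the force-dependent correction terms, treats the momentum as a plain real vector, and assumes a kinetic energy of $\frac{1}{2}|\Pi|^2$. The names `cross_surface`, `delta_s` and `kinetic` are hypothetical. In the notation of the text, the action jump `delta_s` would correspond to the discontinuity $-2d$.

```python
import math

def cross_surface(pi_minus, eta, delta_s):
    """Toy transmit-or-reflect momentum update at an action discontinuity.

    pi_minus : momentum just before the crossing (list of floats)
    eta      : unit vector normal to the crossing surface
    delta_s  : jump in the action across the surface (after minus before)
    Assumes kinetic energy 0.5 * |pi|^2 (an illustrative normalisation).
    Returns (pi_plus, transmitted).
    """
    normal = sum(p * e for p, e in zip(pi_minus, eta))  # (eta, pi^-)
    radicand = normal * normal - 2.0 * delta_s
    if radicand >= 0.0:
        # Transmission: rescale the normal component so that the change in
        # kinetic energy cancels the jump delta_s exactly.
        new_normal = math.copysign(math.sqrt(radicand), normal)
        transmitted = True
    else:
        # No real solution exists: reflect off the surface instead.
        new_normal = -normal
        transmitted = False
    pi_plus = [p + (new_normal - normal) * e for p, e in zip(pi_minus, eta)]
    return pi_plus, transmitted

def kinetic(pi):
    """Kinetic energy in the assumed normalisation."""
    return 0.5 * sum(p * p for p in pi)

pi0 = [0.3, -1.2, 0.5]
eta0 = [0.0, 1.0, 0.0]
pi_t, was_transmitted = cross_surface(pi0, eta0, 0.4)   # small jump: transmit
pi_r, was_reflected = cross_surface(pi0, eta0, 1.0)     # large jump: reflect
```

In the transmitted case the kinetic energy changes by exactly `-delta_s`, so the total energy is conserved across the crossing. In the reflected case only the sign of the normal component flips, and the trajectory stays on the same side of the surface.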

2. Improved Leapfrog algorithm 

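An alternative leapfrog update for the molecular dynamics part of the HMC has been suggested:

1. $\Pi(\tau + \lambda\Delta\tau) = \Pi(\tau) + \lambda\Delta\tau\,\dot{\Pi}(\tau)$.

2. Update the gauge field by half a step, $\Delta\tau/2$, using $\Pi(\tau + \lambda\Delta\tau)$.

3. $\Pi(\tau + (1 - \lambda)\Delta\tau) = \Pi(\tau + \lambda\Delta\tau) + (1 - 2\lambda)\Delta\tau\,\dot{\Pi}(\tau + \lambda\Delta\tau)$.

4. Update the gauge field by a further half step, $\Delta\tau/2$, using $\Pi(\tau + (1 - \lambda)\Delta\tau)$.

5. $\Pi(\tau + \Delta\tau) = \Pi(\tau + (1 - \lambda)\Delta\tau) + \lambda\Delta\tau\,\dot{\Pi}(\tau + (1 - \lambda)\Delta\tau)$.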

The optimal value of $\lambda$ is given in [^]. This algorithm has improved energy conservation,
which more than compensates for the need to invert the overlap operator twice. We have tested it
on $4^4$, $8^4$, and $12^4$ lattices, and found gains of around 30% (see section 7).
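
As a rough illustration of why the extra force evaluation can pay off, the sketch below compares the two integrators on a toy harmonic oscillator rather than on a gauge theory. Here `q` stands in for the gauge field and `p` for the momentum. The half-step position updates are our assumed form of the gauge-field updates between the momentum updates. The value of `LAMBDA` is the one commonly quoted for this type of scheme and is an assumption here, since the text defers to its reference for the optimal value.

```python
LAMBDA = 0.1931833275037836  # assumed value; the text defers to its reference

def force(q):
    """Force -dS/dq for the toy action S(q) = q^2 / 2."""
    return -q

def energy(q, p):
    """Total energy of the toy system."""
    return 0.5 * p * p + 0.5 * q * q

def leapfrog_step(q, p, dt):
    """Standard leapfrog: half kick, full drift, half kick."""
    p += 0.5 * dt * force(q)
    q += dt * p
    p += 0.5 * dt * force(q)
    return q, p

def improved_step(q, p, dt, lam=LAMBDA):
    """Improved update with momentum kicks of lam, 1 - 2 lam, lam."""
    p += lam * dt * force(q)                # momentum update (step 1)
    q += 0.5 * dt * p                       # assumed half-step field update
    p += (1.0 - 2.0 * lam) * dt * force(q)  # momentum update (step 3)
    q += 0.5 * dt * p                       # assumed half-step field update
    p += lam * dt * force(q)                # momentum update (step 5)
    return q, p

def max_energy_error(step, dt, n_steps, q=1.0, p=0.0):
    """Largest |E - E_0| seen along a trajectory of n_steps steps."""
    e0 = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        q, p = step(q, p, dt)
        worst = max(worst, abs(energy(q, p) - e0))
    return worst

# Roughly equal cost once adjacent momentum kicks are merged: one improved
# step of size dt needs two force evaluations, like two leapfrog steps of
# size dt / 2.
err_leapfrog = max_energy_error(leapfrog_step, 0.05, 400)
err_improved = max_energy_error(improved_step, 0.1, 200)
```

On this toy problem the improved update gives a smaller maximum energy violation at comparable cost. The size of the gain in a real simulation depends on the action and the lattice, so the toy numbers should not be read as a prediction of the gains quoted above.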
Figure 1: Dependency of the energy on $\Delta\tau$. The red lines are from top down: ($\Delta\tau$, $\Delta\tau^2$, $\Delta\tau^3$).

3. Stout Smearing 

We use the "stout links" proposed in [||]. As mentioned in [Q] this improves the condition 
number of the Wilson operator substantially, thus speeding up the inversions needed to construct 
the overlap operator. We find however that there is a "phase transition" at a critical level of the 
smearing parameter, leading to a sharp increase in the magnitude of the smallest eigenvalue of the 
Wilson operator. This reduces the effectiveness of the smearing. 

4. Hasenbusch acceleration 

Hasenbusch acceleration has been used to speed up dynamical simulations. We introduce an 
additional fermion flavour with a large mass, and by placing the two fermions on different time 
scales we can in principle reduce the number of low mass inversions needed during a trajectory. 
However, we saw little gain when using this method, partly because we were testing on large 
masses, and partly because our overlap operators are usually well conditioned (see section 

5. Overlap eigenmode preconditioning 

In the case of a topologically nontrivial configuration, the spectrum of the overlap matrix
includes a "zero mode". Inversions of the overlap operator become prohibitively expensive when
simulating in the regime of small quark masses. Our ansatz is to calculate the smallest $m$
eigenvectors and eigenvalues $\lambda_m$ of the overlap operator to a very low precision
(e.g. 10^^) and use them to precondition the CG that serves as the preconditioner in our GMRESR
inverter.

Figure 2: Convergence history for the preconditioning method on a configuration with trivial topology

Our preconditioner is:

m V / 

Figures 2 and 3 show the convergence of CG with and without preconditioning using the above
projector. These plots were generated using an $8^4$ dynamical configuration at mass $\mu = 0.1$,
with the inversions carried out at mass $\mu = 0.03$. Figure 2 shows the convergence history for
the case of a configuration with trivial topology; Figure 3 shows the convergence history for a
configuration with a "zero mode" induced by topology. Clearly in the latter case the
preconditioning offers great possible gains, which, in our experience, increase with increasing
volume and decreasing mass.
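
The effect of such a preconditioner can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the operator or code used for the figures. The "operator" is a diagonal matrix with three near-zero modes. The low-precision eigenvectors are modelled by adding small noise to the exact ones. The preconditioner is taken to be of the common spectral form $P = 1 + \sum_i (1/\lambda_i - 1)\, v_i v_i^T$, which is our assumption for the purpose of the sketch. All function and variable names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(matvec, b, precond, tol=1e-8, max_iter=2000):
    """Preconditioned conjugate gradient; returns (solution, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

rng = random.Random(0)
n = 200
# Toy stand-in for the operator: three near-zero modes plus a bulk spectrum.
small = [1e-4, 3e-4, 1e-3]
diag = small + [0.5 + 1.5 * i / (n - 4) for i in range(n - 3)]

def matvec(v):
    return [d * vi for d, vi in zip(diag, v)]

# "Low precision" eigenpairs: exact unit vectors perturbed by small noise.
modes = []
for k, lam in enumerate(small):
    v = [1e-3 * rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)]
    v[k] += 1.0
    norm = math.sqrt(dot(v, v))
    modes.append((lam, [vi / norm for vi in v]))

def eig_precond(r):
    """Apply P = 1 + sum_i (1/lambda_i - 1) v_i v_i^T (assumed form)."""
    out = list(r)
    for lam, v in modes:
        c = (1.0 / lam - 1.0) * dot(v, r)
        out = [oi + c * vi for oi, vi in zip(out, v)]
    return out

b = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_plain, it_plain = pcg(matvec, b, lambda r: list(r))
x_prec, it_prec = pcg(matvec, b, eig_precond)
```

In this toy setting the preconditioned solve needs fewer iterations than the plain one, because the few small eigenvalues that dominate the condition number are mapped close to one. The approximate eigenvectors only need to be accurate enough that their error does not reintroduce a small eigenvalue, which is why a low precision can suffice.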

In an HMC simulation, using the previous eigenvectors as a starting point for the next
eigenvalue calculation can dramatically reduce the time needed, although it is unclear how large
an effect this has on the reversibility of the MD.
Figure 3: Convergence history for the preconditioning method on a configuration with nontrivial topology



6. Non area conserving correction step 

It is possible to use a non area conserving molecular dynamics update by including the
Jacobian in the Metropolis accept/reject step¹. The detailed balance condition reads:

P[U' ^U]Wc[U] = y"dndn'exp-5n'5([[/,n]-rMZ)[t/',n'])min(l,exp^)Wc[t/] 

= j dndn'exp-^n'5([[/',-n']-r^^[[/,-n])^|^min(l,exp^)Wc[[/] 
dndn' exp- ' 5 ( _n'] - r^^ -n] ) min ( 1 , exp-^) Wc [V] 
P[U ^ U']Wc[U'] 
du,n 



The most general transmission update which is reversible and conserves A is: 
For $r_0 = \infty$, this gives the usual area conserving transmission formula. One has to
reflect if the transmission formula gives a complex $(\Pi^+, \eta)$. By tuning $r_0$, we can
improve the transmission rate. The results displayed in the tables of section 7 were obtained
using $r_0 = 1$, and give a 50% improvement in the transmission rate.
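
The role of the Jacobian in the accept/reject step can be checked on a one-dimensional toy model. The sketch below is not the update of this section. It replaces the molecular dynamics map on $(U, \Pi)$ by a hypothetical involution $x \to 1/x$ on $x > 0$, which is reversible but does not conserve area, and uses an arbitrary toy `energy`. It then verifies that including the Jacobian in the acceptance probability restores detailed balance.

```python
import math

def energy(x):
    """Toy energy; the target density is proportional to exp(-energy(x))."""
    return 0.5 * (math.log(x) - 0.3) ** 2 + x

def update(x):
    """Reversible, non area conserving toy map: an involution on x > 0."""
    return 1.0 / x

def log_jacobian(x):
    """log |d update(x) / dx| for the map above."""
    return -2.0 * math.log(x)

def accept_prob(x):
    """Metropolis acceptance probability with the Jacobian included."""
    x_new = update(x)
    return min(1.0, math.exp(energy(x) - energy(x_new) + log_jacobian(x)))

def flow(x):
    """Unnormalised probability flow out of x: density times acceptance."""
    return math.exp(-energy(x)) * accept_prob(x)

def balance_defect(x):
    """flow(x) dx minus flow(x') dx', per unit dx; zero if balance holds."""
    return flow(x) - flow(update(x)) * math.exp(log_jacobian(x))

defects = [balance_defect(x) for x in (0.2, 0.5, 1.0, 2.0, 7.5)]
```

The factor `exp(log_jacobian(x))` converts the volume element at the image point back to the one at `x`, so a vanishing `balance_defect` is the one-dimensional analogue of the detailed balance condition above. Dropping the Jacobian from `accept_prob` would leave a non-zero defect for this map.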



¹ We thank A. Borici for pointing this out to us.
7. Results

In this section we summarise the results referred to in the previous sections. 



Type        time       Acc   N_md   trans./traj.   refl./traj.
normal      1897(60)   94%   40     0.0738(240)    1.348(100)    325
has         1986(20)   88%   40     0.0521(311)    1.059(94)     307
imp         1420(10)   94%   15     0.0535(233)    0.876(98)     299
imphas      1594(40)   75%   15     0.0772(336)    1.093(118)    324
impnap      1480(10)   95%   15     0.117(34)      1.336(136)    310
impnaphas   1611(60)   78%   15     0.110(21)      0.832(159)    155

$\mu = 0.05$



Type        time       Acc   N_md   trans./traj.   refl./traj.
normal      1816(20)   95%   40     0.447(64)      0.938(80)     465
has         2100(90)   90%   40     0.569(65)      0.880(65)     374
imp         1479(20)   96%   15     0.371(43)      0.947(62)     533
imphas      1470(60)   90%   15     0.413(53)      0.531(76)     518
impnap      1445(50)   95%   15     0.674(89)      1.818(147)    209
impnaphas   N/A        94%   15     0.663(69)      1.370(114)    281

$\Delta\tau = 0.2$



In these tables "has" denotes Hasenbusch acceleration, "imp" denotes usage of the
preconditioner, and "nap" refers to the non area preserving update.